I9EA074102 


Some Rank-Order Tests for Trend in a Set of Correlated Means


Ardie Lubin


Walter Reed Army Institute of Research
Box 2086, Ft. Benning, Ga.
Washington 12, D. C.


In many experiments the major interest is not in the amount of difference caused by the experiment, but in the rank-order which results. This is especially true when n successive measurements are made on the same m subjects, where the subjects are being exposed to increasing amounts of work, sleep loss, etc. For such studies, the null hypothesis is that there is no trend; the usual alternative hypothesis is that the means are a monotonic function of time.


The motivation for this paper comes mainly from A. R. Jonckheere's recent development of a general non-parametric test against ordered alternatives, which can be used to test the hypothesis that a set of correlated means has a predicted rank-order (1954a, 1954b). Jonckheere uses a statistic P (based on the Kendall rank-order correlation, tau), which is the sum of Kendall's P values computed between the predicted rank-order and the observed rank-order for each of the m subjects.

This paper will propose an alternative statistic J (based on Spearman's rank-order correlation, rho), which can be shown to be more powerful than P in some special cases. J is the sum of the S(d²) values computed between the observed ranking of the n scores for each subject and the hypothetical ranking of the n scores, where S(d²) is the sum of the squared differences between the two rankings.
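A minimal sketch of the two definitions, in Python: each subject's S(d²) and Kendall's P are computed against the predicted rank-order and summed into J and P. Kendall's P is taken here as the number of pairs ranked in the same order by both rankings, the convention that reproduces the P column of Table 2. The helper names are illustrative only, and the rank data are copied from Table 2 (Jonckheere's learning example), listed in the order of the predicted ranks 1 to 7.

```python
from itertools import combinations

def s_d2(observed, predicted):
    """Spearman's S(d^2): sum of squared rank differences for one subject."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def kendall_p(observed, predicted):
    """Kendall's P: number of pairs ordered the same way in both rankings."""
    return sum(
        1
        for i, j in combinations(range(len(observed)), 2)
        if (observed[i] - observed[j]) * (predicted[i] - predicted[j]) > 0
    )

# Observed rank-orders of the five subjects (rows of Table 2),
# listed in the order of the predicted ranks 1..7.
ranks = [
    [1, 3, 6, 5, 4, 2, 7],
    [2, 1, 6, 5, 7, 3, 4],
    [1, 2, 6, 5, 4, 3, 7],
    [1, 4, 2, 6, 3, 7, 5],
    [3, 2, 6, 1, 7, 4, 5],
]
predicted = list(range(1, 8))

J = sum(s_d2(r, predicted) for r in ranks)
P = sum(kendall_p(r, predicted) for r in ranks)
print(J, P)  # 134 71
```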

DISTRIBUTION STATEMENT A
Approved for public release;
Distribution Unlimited




DEPARTMENT OF THE ARMY
ARI FIELD UNIT, BENNING
ARMY RESEARCH INSTITUTE FOR THE BEHAVIORAL
P.O. BOX 2086, FORT BENNING, GEORGIA

8 August 1979

SUBJECT: Shipment of Documents

Defense Documentation Center
Cameron Station
Alexandria, VA 22314
ATTN: Selection & Cataloging

The Documents in these shipments are approved for public release. The distribution is unlimited.

FOR THE CHIEF

ALEXANDER NICOLINI
Major, Infantry
R&D Coordinator


The assumptions underlying such rank-order tests as the P and J procedures will be discussed in detail, and a robust analog of J will be proposed. Some experimental designs will be suggested for which these trend tests are appropriate.


J is related in a rather simple way to other non-parametric tests.

It has already been noted that J is simply the Spearman rank-order analog of P. Since Spearman's rho is more sensitive than Kendall's tau, J is more sensitive, and sometimes more powerful, than P. (The relative power of the two tests will be discussed in detail later.) On the other hand, for small values of n, P can be approximated by the normal distribution much better than J.

J is exactly equivalent to the one-tail binomial sign test when n = 2, so the J test can be regarded as an extension of the sign test to the case where n > 2.
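A short derivation (ours, not the author's) shows why. With n = 2 a subject's observed order either agrees with the predicted order or reverses it, so, under the null hypothesis and with subjects independent,

$$
S_i(d^2) = \begin{cases} 0 & \text{if subject } i \text{ agrees with the prediction},\\ 2 & \text{if subject } i \text{ reverses it}, \end{cases}
\qquad
J = \sum_{i=1}^{m} S_i(d^2) = 2R, \quad R \sim \operatorname{Binomial}\!\left(m, \tfrac{1}{2}\right).
$$

Small values of J therefore correspond exactly to few reversals, which is the one-tail sign test on R.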

The average rank-order correlation between the m rankings and the predicted rank-order is given by

$$K = 1 - \frac{6J}{m(n^3 - n)}.$$

K can be regarded as a special case of Kendall's W coefficient. W is a linear function of the average Spearman rank-order intercorrelation of m rankings with one another, whereas K is the average correlation of

distributed for large values of m or n.

Assumption 1: For any subject, all permutations of the n scores are equally likely.

Assumption 2: The rank-order for any subject is statistically independent of the rank-order for any other subject.

Kendall (1948) has shown that Assumption 1 leads to a normal distribution of S(d²) when n is large. Therefore, when n is large, J is normally distributed, since the sum of independent normally distributed variables is itself normally distributed.

Assumption 2 leads to a normal distribution for J when m is large since, by the well-known Central Limit Theorem, a sum of m independent random variables will tend to normality as m increases.

M. G. Kendall (1948) gives the mean of S(d²) as

$$\frac{n^3 - n}{6} \qquad (3)$$

and the variance as

$$\frac{n^2 (n - 1)(n + 1)^2}{36}. \qquad (4)$$

Since the mean of a sum is equal to the sum of the means, the mean of J is

$$\frac{m(n^3 - n)}{6}. \qquad (5)$$
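Formulas (3) and (4) can be checked by brute force for small n. The sketch below (function name ours) enumerates all n! rank-orders, which is exactly Assumption 1, and compares the exact mean and variance of S(d²) with (n³ - n)/6 and n²(n - 1)(n + 1)²/36 using exact fractions.

```python
from fractions import Fraction
from itertools import permutations

def s_d2_moments(n):
    """Exact mean and variance of S(d^2) when all n! rank-orders are equally likely."""
    natural = range(1, n + 1)
    values = [sum((o - p) ** 2 for o, p in zip(perm, natural))
              for perm in permutations(natural)]
    mean = Fraction(sum(values), len(values))
    variance = Fraction(sum(v * v for v in values), len(values)) - mean ** 2
    return mean, variance

for n in range(2, 8):
    mean, variance = s_d2_moments(n)
    assert mean == Fraction(n ** 3 - n, 6)                             # formula (3)
    assert variance == Fraction(n ** 2 * (n - 1) * (n + 1) ** 2, 36)   # formula (4)
    print(n, mean, variance)
```

Enumeration is only a check for small n; the formulas themselves are what the large-sample test uses.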

Since the variance of a sum of uncorrelated variables is equal to the sum of the variances, it follows that the variance of J is

$$\frac{m\,n^2 (n - 1)(n + 1)^2}{36}.$$

Lubin 


The value of S(d²) is always even, so the interval between adjacent values of J is always two. Therefore the correction for continuity is one (half the interval).


From the above, it follows that for large values of m and n

$$z = \frac{J - \frac{m(n^3 - n)}{6} + 1}{\frac{n(n + 1)}{6}\sqrt{m(n - 1)}}$$

is normally distributed, with a mean of zero and a standard deviation of unity, when the null hypothesis is true. Since the alternative hypothesis is directional, only negative values of z need be tested for significance; the null hypothesis is accepted automatically if z is positive or zero.
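A minimal sketch of the large-sample J test, assuming the mean and standard deviation given above and the +1 continuity correction (appropriate because only negative z is tested). The function names are illustrative; the normal tail area uses the standard identity Φ(z) = erfc(-z/√2)/2. Applied to Jonckheere's data (J = 134, m = 5, n = 7), it also reports the average Spearman correlation K = 1 - 6J/(m(n³ - n)).

```python
import math

def j_test(J, m, n):
    """Return (mean, sd, z) for the large-sample J test with the +1 continuity correction."""
    mean = m * (n ** 3 - n) / 6
    sd = n * (n + 1) * math.sqrt(m * (n - 1)) / 6
    z = (J - mean + 1) / sd
    return mean, sd, z

def normal_cdf(z):
    """Lower-tail area of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

m, n, J = 5, 7, 134                  # Jonckheere's learning example (Table 2)
mean, sd, z = j_test(J, m, n)
K = 1 - 6 * J / (m * (n ** 3 - n))   # average Spearman rho with the prediction
print(mean, round(sd, 3), round(z, 2), round(K, 3))  # 280.0 51.121 -2.84 0.521
print(normal_cdf(z))                 # one-tail probability, well below .01
```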


The large sample test in terms of K is

Since it is a one-tail test, only positive values of z need be tested for significance; the null hypothesis is accepted automatically if z is negative or zero.
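Because K = 1 - 6J/(m(n³ - n)) is a decreasing linear function of J, the test in terms of K can be written as the deviate for J with its sign reversed. The following is a derivation under the moments given above, not necessarily the author's printed form; under the null hypothesis K has mean zero and variance 1/(m(n - 1)):

$$
z = -\,\frac{J - \frac{m(n^3-n)}{6} + 1}{\frac{n(n+1)}{6}\sqrt{m(n-1)}}
  = \left(K - \frac{6}{m(n^3-n)}\right)\sqrt{m(n-1)} .
$$

For Jonckheere's example (K ≈ .521, m = 5, n = 7) this gives z ≈ 2.84, the same magnitude as the deviate for J.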


The distribution of J for small values of m and n tends to normality much more slowly than does that of P. The distribution of S(d²) is symmetric but, as Kendall (1948) has pointed out, it has the unusual property that values close to the mean are not necessarily more frequent than values further from the mean. This gives the distribution curve a peculiar sawlike profile. Since these reversals occur mostly near the mean, the normal curve gives a better fit to the tails of the cumulative distribution than it does near the mean. Fortunately, in tests of significance, we are mainly interested in the fit near the tails.

In general, J reflects these properties of S(d²), although the reversals disappear rather quickly as n increases. Table 7 gives the probability distribution of J for the cases where the normal curve gives a poor fit to the tail of the distribution. Let us arbitrarily define the tail of a distribution as that portion where the cumulative probability is less than .100. Then, as a rough guide, we can say that when mn is greater than 12, the maximum error in the distribution tail will be .004 or less. The maximum error near the mean, when mn > 12, is .009 or less. Distribution tables are given in this article for all cases where the maximum tail error is .004 or greater, except where m = 1 or n = 2. When n = 2, J is, of course, the binomial sign test; when m = 1, J becomes equal to S(d²). Adequate tables for these cases can be found elsewhere (e.g., E. S. Pearson and H. O. Hartley, 1954, pp. 186, 211 and 238).
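Entries of a table like Table 7 can, in principle, be recomputed exactly. The sketch below (function names are ours) builds the null distribution of S(d²) for one subject by enumerating the n! equally likely rank-orders (Assumption 1) and convolves it m times, which is valid because subjects are independent (Assumption 2). It is practical only for small n, since the enumeration grows as n!.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def s_d2_distribution(n):
    """Null distribution of S(d^2) for one subject: value -> number of rank-orders."""
    natural = range(1, n + 1)
    return Counter(
        sum((o - p) ** 2 for o, p in zip(perm, natural))
        for perm in permutations(natural)
    )

def j_cdf(j, m, n):
    """Exact P(J <= j) for m independent subjects under Assumptions 1 and 2."""
    single = s_d2_distribution(n)
    total = Counter({0: 1})
    for _ in range(m):                 # add one subject at a time (convolution)
        nxt = Counter()
        for a, ca in total.items():
            for b, cb in single.items():
                nxt[a + b] += ca * cb
        total = nxt
    favourable = sum(c for v, c in total.items() if v <= j)
    return Fraction(favourable, sum(total.values()))

p = j_cdf(12, m=3, n=4)
print(p, float(p))  # 617/13824
```

For n = 4 and m = 3 it gives P(J ≤ 12) = 617/13824 ≈ .0446, the probability used in the sleep-loss example.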

To fix our ideas, let us apply the J test to a learning example used by Jonckheere.


Insert Table 1 about here


In Table 2, the rank-order scores for the values in Table 1 are given. The null hypothesis is that there is no trend. The alternative hypothesis is that the number of errors decreases from the first learning trial to the seventh.

Insert Table 2 about here

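Here there are five subjects and seven scores, so m = 5 and n = 7. Since mn > 12, the large sample test will be used. The expected mean and standard deviation are

$$\mu = \frac{m(n^3 - n)}{6} = \frac{5 \cdot 336}{6} = 280,$$

$$\sigma = \frac{n(n + 1)}{6}\sqrt{m(n - 1)} = \frac{56}{6}\sqrt{30} = 51.121.$$

The normal deviate is therefore

$$z = \frac{J - \mu + 1}{\sigma} = \frac{134 - 280 + 1}{51.121} = -2.84.$$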


Thus the result is significant beyond the .01 level. If z had been positive, the null hypothesis would have been accepted.

We can also calculate the normal deviate for Jonckheere's P test. The mean of P is mn(n - 1)/4; the variance is mn(n - 1)(2n + 5)/72; and the correction for continuity is 1/2. (The normal deviate for P always has the sign opposite to that for J.)

In this case J is more powerful than P. As will be shown later, this is because of its greater sensitivity.
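As a sketch (names illustrative), plugging P = 71, the total of the P column of Table 2, into the mean, variance and 1/2 continuity correction just given yields the deviate for P. The correction is subtracted because, for P, large values indicate agreement with the predicted order.

```python
import math

def p_test(P, m, n):
    """Return (mean, sd, z) for Jonckheere's P with the 1/2 continuity correction."""
    mean = m * n * (n - 1) / 4
    sd = math.sqrt(m * n * (n - 1) * (2 * n + 5) / 72)
    z = (P - mean - 0.5) / sd   # large P means agreement, so z is positive here
    return mean, sd, z

mean, sd, z = p_test(P=71, m=5, n=7)
print(mean, round(sd, 3), round(z, 2))  # 52.5 7.444 2.42
```

The magnitude, about 2.42, is smaller than the 2.84 obtained for J on the same data.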

Now let us apply the J test to a case where m and n are small. In Table 3 a fictitious example is given where the number of seconds of alpha-frequency EEG waves during a 60-second interval has been recorded for each subject during four successive days of sleep loss. In Table 4, the rank-order values are given. The null hypothesis is that there is no trend. The alternative hypothesis is that EEG waves of the alpha type appear less frequently as sleep loss increases.

Insert Tables 3 and 4 about here

Since there are three subjects and four scores, mn = 12. From Table 7 we find that an observed value of J = 12 (for n = 4 and m = 3) will occur with a probability of .045, so the average Spearman rank-order coefficient of .60 is significant at the .05 level.

An mn value of 12 is just within the arbitrary borderline we have drawn for small samples. How much error would there be if the normal approximation were used?

$$z = \frac{J - \mu + 1}{\sigma} = \frac{12 - 30 + 1}{10} = -1.70$$

This z corresponds to a one-tail probability of .0446, as compared to the exact probability of .045. Clearly, the normal approximation is nearly perfect here.

Using Jonckheere's trend test,

$$z = \frac{P - \frac{1}{4}mn(n - 1) - \frac{1}{2}}{\sqrt{mn(n - 1)(2n + 5)/72}} = \frac{12 - 9.5}{2.550} = .980.$$

Thus, using the P test in this case would have led to the acceptance of the null hypothesis. This example was deliberately constructed to show that a large difference can occur between the J test and the P test. (Later we will construct an example where P is more powerful than J.)
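Both small-sample deviates can be reproduced with a few lines (a sketch; variable names are ours). P = 12 is taken from the numerator 12 - 9.5 shown above, and the mean and standard deviation of J follow from the large-sample formulas with m = 3 and n = 4, giving 30 and 10.

```python
import math

def normal_cdf(z):
    """Lower-tail area of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

m, n, J, P = 3, 4, 12, 12

mu_J = m * (n ** 3 - n) / 6                       # 30
sd_J = n * (n + 1) * math.sqrt(m * (n - 1)) / 6   # 10
z_J = (J - mu_J + 1) / sd_J

mu_P = m * n * (n - 1) / 4                        # 9
sd_P = math.sqrt(m * n * (n - 1) * (2 * n + 5) / 72)
z_P = (P - mu_P - 0.5) / sd_P

print(round(z_J, 2), round(normal_cdf(z_J), 4), round(z_P, 2))  # -1.7 0.0446 0.98
```

The P deviate falls well short of the one-tail 5 percent point (1.645), while the tail probability for J is about .045.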




Alternatives to the J Test

The rank-order tests for trend seem to have no parametric analogs.

If we are willing to specify the exact differences that theoretically should exist between the n correlated means, Hotelling's T² test (Hotelling, 1931) enables us to compute the likelihood that the observed set of n means could be a sample from the theoretical universe. But in general this places too great a burden upon the experimenter. Ordinarily, he can only specify the direction in which the subject's scores should move, not the amounts. So there is no multivariate normal statistic which is exactly analogous to the J test. A series of t-tests could be used, but the significance level would be hard to determine for any given sequence of results.

What might be called a quasi-parametric test of trend has been proposed (personal communication) by N. Mantel of the National Institutes of Health.

Let Y_ij be the score of the i-th subject on the j-th treatment, and let X_j be the theoretical rank of the j-th treatment.

Then b_i, the slope of Y on X, can be calculated in the usual least-squares manner for the i-th subject. If the null hypothesis is correct, the average value of b is zero; if the alternative hypothesis is correct, the average is positive.

When the distribution of observed b's is normal, a one-tail t test with (m - 1) d.f. is most powerful; otherwise a permutation or sign test can be used. The fundamental assumptions here are: (1) Y is a positive linear function of X, and (2) the n scores for a subject are statistically independent of the scores for any other subject.
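A minimal sketch of the slope procedure as described: a least-squares slope of Y on X for each subject, followed by a one-sample t statistic on the m slopes with m - 1 d.f. The function names and the small data set are hypothetical; the X values are assumed to be coded so that the predicted trend gives positive slopes, and the t statistic still has to be referred to a table of Student's t.

```python
import math
from statistics import mean, stdev

def slopes(scores, x):
    """Least-squares slope b_i of each subject's scores Y on the theoretical ranks X."""
    xbar = mean(x)
    sxx = sum((xj - xbar) ** 2 for xj in x)
    result = []
    for row in scores:
        ybar = mean(row)
        sxy = sum((xj - xbar) * (yj - ybar) for xj, yj in zip(x, row))
        result.append(sxy / sxx)
    return result

def slope_t(scores, x):
    """One-sample t statistic for the mean slope against zero, with m - 1 d.f."""
    b = slopes(scores, x)
    return mean(b) / (stdev(b) / math.sqrt(len(b))), len(b) - 1

scores = [[1, 2, 4], [2, 2, 3], [0, 3, 5]]   # hypothetical: 3 subjects, 3 treatments
t, df = slope_t(scores, [1, 2, 3])
print(round(t, 3), df)  # 2.598 2
```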


Clearly, the first assumption is not absolutely necessary. If the theoretical trend is any monotonic function, Mantel's test will be useful because it allows us to compare the strength of trends. The non-parametric tests, J and P, are tests of consistency rather than of amount of change. Suppose we are trying to compare the effect of sleep loss on simple and disjunctive reaction time. Both measures may give us a K value of unity, and yet the increase in reaction time could be considerably greater for one task.

Another alternative which preserves the metric is R. A. Fisher's permutation test. Here it is assumed that, for any subject, his set of n observed scores is purely a chance arrangement and any other permutation of the n observed scores is equally likely. This is clearly Assumption 1, with the addition that the set of observed scores is an adequate representation of the universe. In general, Fisher's permutation procedure requires the computation of (n!)^m sets of n correlated means. The J and P tests may be viewed as permutation tests on rank-order transformations of the raw scores.
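A brute-force sketch of the permutation test, assuming (our choice, not stated in the text) that the statistic is the linear trend contrast Σ X_j T_j, where T_j is the treatment total and X_j the theoretical rank. Each subject's scores are permuted independently, so all (n!)^m arrangements are visited; this is feasible only for very small m and n. Function names and data are hypothetical.

```python
from itertools import permutations, product

def trend_statistic(scores, x):
    """Sum over treatments of X_j times the treatment total T_j."""
    return sum(xj * sum(row[j] for row in scores) for j, xj in enumerate(x))

def fisher_permutation_p(scores, x):
    """Exact one-tail p-value over all (n!)**m within-subject rearrangements."""
    observed = trend_statistic(scores, x)
    per_subject = [list(permutations(row)) for row in scores]
    hits = total = 0
    for arrangement in product(*per_subject):
        total += 1
        if trend_statistic(arrangement, x) >= observed:
            hits += 1
    return hits / total

scores = [[3, 5, 9], [1, 4, 6]]   # hypothetical: 2 subjects, 3 treatments
print(fisher_permutation_p(scores, [1, 2, 3]))  # 1/36, about 0.028
```

Because both subjects' scores already increase with X, the observed arrangement is the unique maximum, so the p-value is 1/(3!)² = 1/36.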

Presumably there are other non-parametric trend tests that could be used in place of J or P. For example, the set of n correlated means could be rank-ordered and a Kendall tau or Spearman rho computed. Essentially, this is a rank-order trend test where m is always unity. As Jonckheere has pointed out, such a test would always be less powerful than the J and P tests, which take account of the size of m.

We have several times previously noted that the J test seems in some cases to be more powerful than the P test. This results from the fact that Spearman's S(d²) is usually a more sensitive measure of rank-order correlation than Kendall's P. For example, take the rank-orders 1342, 1423, 2314, 3124, and 2143. All of these have the same Kendall P value, 4. If we compute Spearman's S(d²), then four of these rank-orders end up with the same value, 6, but the rank-order 2143 has an S(d²) of 4 (indicating a Spearman rho of .60, compared with .40 for the other rank-orders). It is these discriminations which cause the sawtooth profile of the distribution of rho.
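The five rank-orders in this example can be checked directly. In the sketch below (helper names are ours), P counts the pairs that appear in increasing order, S(d²) is taken against the natural order 1234, and rho = 1 - 6S(d²)/(n³ - n).

```python
from itertools import combinations

def s_d2(order):
    """Spearman's S(d^2) between a rank-order and the natural order 1..n."""
    return sum((r - (i + 1)) ** 2 for i, r in enumerate(order))

def kendall_p(order):
    """Kendall's P: number of pairs (earlier, later) that appear in increasing order."""
    return sum(1 for a, b in combinations(order, 2) if a < b)

def rho(order):
    """Spearman's rho with the natural order."""
    n = len(order)
    return 1 - 6 * s_d2(order) / (n ** 3 - n)

for text in ["1342", "1423", "2314", "3124", "2143"]:
    order = [int(c) for c in text]
    print(text, kendall_p(order), s_d2(order), round(rho(order), 2))
```

Every order has P = 4, but S(d²) separates 2143 (S = 4, rho = .60) from the other four (S = 6, rho = .40).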

The relation between Spearman's rho and Kendall's tau is rather complex. When n = 2 or 3, they are identical (except for a monotonic transformation). When n = 4, tau is a single-valued function of rho, but rho is not a single-valued function of tau. Therefore, when n = 4, rho is more powerful than tau. When n is greater than 4, it is still true that there are many more possible values of rho than of tau, but tau is no longer a single-valued function of rho. There are some relatively infrequent occasions when there are several values of tau for each value of rho. So, although rho will be more powerful than tau for most non-null universes that can be constructed, it is not uniformly more powerful. (I am grateful to M. P. Schützenberger of the Massachusetts Institute of Technology for his help in working out this relation.)


This means that it is possible to construct a non-null universe where P will be more powerful than J. For example, when n = 5, take a population where the rank-order is always 23451. The Spearman rho equals zero, whereas the same rank-order gives a value of .20 for Kendall's tau. Clearly, J will never be significant no matter how large m becomes, whereas P will be significant at the 5 percent level for m ≥ 13.
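The numbers in this example can be verified with a short sketch (names ours). It assumes the normal approximation for P with the mean, variance and 1/2 correction given earlier, the +1 correction for J, and 1.645 as the one-tail 5 percent point; every one of the m subjects is given the rank-order 23451.

```python
import math
from itertools import combinations

order = [2, 3, 4, 5, 1]
n = len(order)
S = sum((r - (i + 1)) ** 2 for i, r in enumerate(order))      # S(d^2) = 20
P_one = sum(1 for a, b in combinations(order, 2) if a < b)    # concordant pairs = 6
pairs = n * (n - 1) // 2
rho = 1 - 6 * S / (n ** 3 - n)
tau = (2 * P_one - pairs) / pairs

def z_P(m):
    """Normal deviate of Jonckheere's P when all m subjects show `order`."""
    mean = m * n * (n - 1) / 4
    sd = math.sqrt(m * n * (n - 1) * (2 * n + 5) / 72)
    return (m * P_one - mean - 0.5) / sd

def z_J(m):
    """Normal deviate of J for the same data; m*S equals the null mean, so z > 0."""
    mean = m * (n ** 3 - n) / 6
    sd = n * (n + 1) * math.sqrt(m * (n - 1)) / 6
    return (m * S - mean + 1) / sd

m_min = next(m for m in range(1, 100) if z_P(m) >= 1.645)
print(rho, tau, m_min)  # 0.0 0.2 13
```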


It is of some interest to ascertain whether a statistic such as S(d^q), where q is even, might prove more powerful than either S(d²) or P. It turns out that S(d^q) gives results equivalent to S(d²) for n = 2, 3, 4, and 5, but does introduce additional discrimination for n > 5. The additional discrimination seems to be slight, and this approach has not been pursued further.

In general, then, there is no exact parametric analog of the J test, and J will usually (but not always) be more powerful than P.


Appropriateness of the J Test

Under what circumstances should the J test be applied? To answer this question, let us re-examine the two basic assumptions and consider what alternative hypotheses we want to test.

(1) For any subject, all permutations of the n scores are equally likely.

(2) The rank-order for any subject is statistically independent of the rank-order for any other subject.

Assumption 2 need not trouble us. With the exception of such obvious cases as siblings or matched groups, one subject's rank-order can always be expected to be independent of any other subject's rank-order if the null hypothesis is true and the n treatments have no effect.


But Assumption 1, though simply stated, raises a considerable number of difficulties, the same ones that arise in using analysis of variance for a successive-measurement design. These difficulties are of two sorts, experimental and statistical.

Experimental difficulties occur if there is any systematic effect of the measurement itself (such as a practice effect). (E.g., if there is learning or fatigue, then even when the n treatments do not differ, there will be a trend in the data.) From the experimental design point of view, it is clear that somehow the "carry-over" effect of learning, fatigue, etc., must be dealt with before the effect of the treatments can be assessed. Two common designs for accomplishing this are the "plateau" procedure and the use of matched controls.

The first procedure involves testing the subject until his scores level off and a steady state is reached. There are several objections to this: (1) considerable time is required; (2) many plateaus can be encountered in the learning curve of one subject, and it is difficult if not impossible to state when the ultimate level has been reached; (3) sometimes it is exactly the effect of the treatments on the speed of learning that we wish to ascertain.

The "matched control" design uses control subjects, paired with each experimental subject, who experience all of the n successive measurements without the treatments. A difference score, D_i = E_i - C_i, can then be computed for each pair on the i-th measurement. The expected value of this difference score will be constant if the treatments have no effect on the scores of the experimental subject.
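One way to carry the matched-control idea into the J test (our illustration; the text does not spell out this step) is to rank each pair's difference scores D_i and compare them with the predicted order. The data and helper names below are hypothetical, and ties among the D_i are assumed away for simplicity.

```python
def difference_scores(experimental, control):
    """D_i = E_i - C_i for each measurement of one experimental/control pair."""
    return [e - c for e, c in zip(experimental, control)]

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

pairs = [  # hypothetical (experimental, control) scores over n = 4 measurements
    ([10, 12, 15, 19], [10, 11, 12, 13]),
    ([8, 9, 9, 12], [7, 9, 7, 9]),
]
predicted = [1, 2, 3, 4]  # predicted rank-order of the difference scores

J = 0
for e, c in pairs:
    d = difference_scores(e, c)
    J += sum((r - p) ** 2 for r, p in zip(ranks(d), predicted))
print(J)  # 2
```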

A major disadvantage of this design is that it assumes the control will have the same shape of learning (or fatigue) curve that the experimental subject would have had. If the matching is poor, considerable error variance can be introduced via this assumption. However, such errors will not bias the trend test in the population.

It is customary to use what are called "balanced" designs. Thus, if there are two treatments A and B, Group 1 would receive the treatments in the order AB; Group 2 would have the order BA. In general, the experimental design and assumptions are those of the Latin square. In particular, it is assumed that there is no interaction between treatments and order. Suppose, however, that under Treatment A scores are depressed initially, but learning proceeds much more swiftly and a higher ultimate level is reached than under Treatment B. Then, if a great many trials are given for each treatment, Treatment A will be judged to be superior. By shortening the number of trials, the advantage for A can be wiped out, and even reversed. Clearly, any interaction between treatment and order can lead to bias. And the very existence of effects such as learning, boredom, fatigue, etc., makes it unsafe to assume that there will be no interaction between such effects and the treatment.

Still another design difficulty arises if any of the treatments has residual effects; e.g., some drugs have a physiological effect that lasts for days and sometimes weeks. One usual method is to allow sufficient time to elapse between treatments so that the residual effects have been eliminated. Again, if the residual effects cannot be eliminated, balanced designs are of doubtful value. In general, psychological experiments cannot meet the requirements that the treatment effects be constant regardless of order and that the residual effects be zero.

Unless the treatments are the practice effects of the successive measurements themselves (as in Jonckheere's example), the "matched control" design is strongly recommended whenever successive measurements are involved.

Unfortunately, this still does not dispose of Assumption 1. Clearly, Assumption 1 also involves statistical restrictions on the raw score distributions. For example, if one distribution is rectangular and the other is skewed, Assumption 1 does not hold. What conditions must be placed on the n-variate distribution of raw scores such that, when the n treatments have no effect, the assumption that "all permutations are equally likely" holds true?

Lehmann and Stein (1949) mention a particular kind of n-variate symmetry in which the univariate distributions of the n measurements are identical, the bivariate distributions of any two treatment variables are identical, the trivariate distributions of any three treatment variables are identical, etc. For an n-variate normal distribution this implies equal means, equal variances, and a constant intercorrelation.

If any multivariate distribution of raw scores exhibits this symmetry, then of necessity the ELP (equally-likely-permutations assumption) will hold. (I am indebted to Professor W. A. Wallis of the University of Chicago and Doctor J. R. Rosenblatt of the National Bureau of Standards for calling my attention to this particular kind of "multivariate" symmetry and for aid in interpreting the results.)


E. L. Lehmann (personal communication) has pointed out that although this particular multivariate symmetry is sufficient for the ELP, it is not necessary. For example, if we take any two non-identical independent distributions that are symmetric about the same mean, the ELP will hold, even though the univariate distributions are different. (Unfortunately, this does not extend to the case of three or more symmetric distributions.) So far as I know, a set of conditions which would be both necessary and sufficient for the ELP has not been devised.
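Lehmann's remark that the result does not carry over to three distributions can be illustrated with a small simulation (our own example). Y1, Y2 and Y3 are independent and symmetric about the same mean, zero, but Y1 is far less variable than the other two. Under the ELP, Y1 would fall in the middle of the three one time in three; here it does so about half the time, because it lies between Y2 and Y3 whenever they have opposite signs.

```python
import random

def prob_first_in_middle(trials=50_000, seed=1):
    """Estimate how often Y1 ranks second among three independent variables,
    each symmetric about 0 but with very different spreads."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y1 = rng.gauss(0.0, 0.001)   # tightly concentrated about 0
        y2 = rng.gauss(0.0, 1.0)
        y3 = rng.gauss(0.0, 1.0)
        if min(y2, y3) < y1 < max(y2, y3):
            hits += 1
    return hits / trials

print(prob_first_in_middle())
```

The estimate should come out near 0.5 rather than the 1/3 required by the ELP, even though all three means are equal.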

This requirement of the ELP is a very severe restriction upon the use of the J test. Essentially, J is a test of the homogeneity of the treatment variables. If J is significant, it is assumed that this is due to differences between the n means, but the significance of J may be due to differences in the variance, third moment, fourth moment, etc. Any deviation from "multivariate" symmetry could conceivably cause a significant J. This difficulty is not confined to the J test but seems to apply to all rank-order tests for differences between means; e.g., Kendall's W, the Kruskal-Wallis H, Festinger's d, etc. The demand that all treatment variables have a constant dependency upon one another is strikingly similar to the demand in two-way analysis of variance that the treatment variables have a constant intercorrelation.

What can be done if one suspects that the ELP does not hold? At least two different paths are possible. One solution is a procedure to test whether a significant J is due to the differences between the n means or to deviations from the ELP. The other way is to construct a robust test which does not demand the ELP.


A Preliminary Test for J


Let us take the raw scores for each treatment and convert them into deviations from the treatment mean. Then the mean of the deviate scores for each treatment is zero. Now rank-order the deviate scores for each subject in the usual way and perform the J test. If J is significant, it must be due to deviations from the ELP, since the treatment means are equal. For convenience I shall call this preliminary procedure the A test.
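A minimal sketch of the A test on the raw data of Table 1 follows (function names are ours). Two conventions are assumptions rather than statements from the text: tied deviations are given the mean of their ranks, and trial 7 receives predicted rank 1 (the convention that reproduces the ranks of Table 2). Deviations are computed as m·Y_ij minus the treatment total, which is m times the deviation from the treatment mean, so the arithmetic stays exact and the ranking is unchanged.

```python
import math

def average_ranks(values):
    """Ranks from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return r

def a_test(scores, predicted):
    """J and its normal deviate computed on deviations from the treatment means."""
    m, n = len(scores), len(predicted)
    totals = [sum(row[j] for row in scores) for j in range(n)]
    J = 0.0
    for row in scores:
        dev = [m * y - t for y, t in zip(row, totals)]   # m times the deviation
        J += sum((r - p) ** 2 for r, p in zip(average_ranks(dev), predicted))
    mean = m * (n ** 3 - n) / 6
    sd = n * (n + 1) * math.sqrt(m * (n - 1)) / 6
    return J, (J - mean + 1) / sd

# Table 1: errors on trials 1..7 for subjects 1..5; trial 7 has predicted rank 1.
table1 = [
    [12, 3, 5, 8, 10, 4, 0],
    [5, 4, 8, 6, 7, 0, 2],
    [10, 4, 5, 8, 9, 2, 1],
    [9, 11, 6, 10, 5, 7, 0],
    [7, 5, 9, 2, 8, 3, 4],
]
predicted = [7, 6, 5, 4, 3, 2, 1]
J_A, z_A = a_test(table1, predicted)
print(J_A, round(z_A, 2))  # 303.0 0.47
```

With these conventions it gives J = 303 and z ≈ 0.47, a positive value below unity, which agrees in direction with the conclusion drawn from Table 6; Tables 5 and 6 themselves may have handled ties differently.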

Let us apply the A test to Jonckheere's example. This has been done in Tables 5 and 6. Since the z from Table 6 is positive and less than unity, we conclude that the significant z found from the data in Table 1 is not due to deviations from the ELP.

Insert Tables 5 and 6 about here

A Robust Analog of J

An alternative to a preliminary test would be to alter the J test so that it would be robust to deviations from the ELP. (I am indebted to Doctor S. Greenhouse and Doctor S. Geisser of the National Institutes of Health for suggesting this possibility and for their advice in the construction of a robust analog of J.) Let us see what can be done along this line.

The second assumption guarantees, through the Central Limit Theorem, that all we have to do is ascertain the expected values of the mean and variance of S(d²) for each subject.

Let us examine the distribution of S(d²) that results when the first assumption does not hold. Since the ELP assumption results in a rectangular distribution where each possible rank-order has a probability of 1/n!, the effect of deviations from this assumption will be to make the probabilities for the various rank-orders unequal. If all the probability is concentrated in one rank-order, the variance of S(d²) will, of course, be zero.

Let us assume that the distribution of probabilities is such that the expected mean S(d²) remains (n³ - n)/6, i.e., there is no trend. The maximum variance for S(d²) occurs when all the probabilities are concentrated equally on the minimum and maximum values of S(d²), zero and (n³ - n)/3. (I am indebted to S. Geisser for a proof of this.) Then the variance associated with each subject is

$$\left(\frac{n^3 - n}{6}\right)^2.$$

If we have m subjects whose mean S(d²), on the null hypothesis, is (n³ - n)/6, then the expected mean of J remains m(n³ - n)/6, but the maximum variance of J is m(n³ - n)²/36. The statistic

$$g = \frac{J - \frac{m(n^3 - n)}{6} + 1}{\frac{n^3 - n}{6}\sqrt{m}}$$

represents a test of the assumption that the average rank-order correlation with the a priori rank-order is zero for each subject, where the n treatment distributions may have any shape or any kind of dependence on one another. It is, however, an extremely weak test and requires many more subjects than J does for significance.
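A sketch of the robust statistic g (function name ours), using the mean m(n³ - n)/6 and the maximum standard deviation (n³ - n)√m/6 given above. Comparing the two variance formulas, the maximum variance is n - 1 times the variance of J under the ELP, so for the same data g is smaller in magnitude than the ordinary deviate by a factor of √(n - 1). On Jonckheere's data it gives about -1.16, not significant at the one-tail 5 percent level, where the ordinary J test gave -2.84.

```python
import math

def g_test(J, m, n):
    """Robust analog of J: deviate using the maximum possible variance of S(d^2)."""
    mean = m * (n ** 3 - n) / 6
    sd_max = (n ** 3 - n) * math.sqrt(m) / 6
    return (J - mean + 1) / sd_max

g = g_test(J=134, m=5, n=7)
print(round(g, 2))  # -1.16
```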


What are the alternative hypotheses against which g, the robust analog of the J test, should be used? This is difficult to answer exactly. Suppose we have n treatments, each of which gives rise to a symmetric distribution. When the distributions are identical, with constant dependency, the J test will be appropriate. But this is almost equivalent to the conditions that must be met for analysis of variance. When the distributions are not identical, or when the dependencies are not constant, the g test will be safe.

It is possible, as we have mentioned previously, to have n treatment distributions such that the population means are equal but the expected value of Spearman's rho is non-zero. In other words, neither J nor its robust analog, g, is primarily a test for the equality of the n correlated means. They are tests for trend which weight each subject's contribution equally.

Another anomalous situation for the J test would be the case where there are two or more distinct groups of subjects and the n treatments may be expected to exhibit a different trend in each group. (This is analogous to a significant block-by-treatment interaction in the two-way analysis of variance.) As long as the expected value of Spearman's rho is positive for every subject, the robust test, g, will be appropriate. However, if the expected values are both positive and negative, I know of no single simple test of significance.


Summary

A statistic, J, has been proposed as a test for the differences between a set of n correlated means when m subjects have been subjected to n treatments. The null hypothesis is that all permutations of the n scores for each subject are equally likely. The alternative hypothesis is that there is a trend which can be specified by the experimenter in the form of a theoretical rank-order. The J statistic is the sum of the S(d²) values that can be computed between the hypothetical rank-order and the observed rank-order for each of the m subjects. It is therefore equivalent to an average Spearman rank-order correlation. There is no parametric test that can be substituted for the non-parametric trend test.

The distribution of J has been shown to be nearly normal for the cases where mn > 12, and distribution tables have been given for those cases where the normal curve does not give an adequate fit.

The J test is essentially identical with Jonckheere's P test for trend, except that it is based on Spearman's rho rather than Kendall's tau. It is suggested that the greater sensitivity of rho usually leads to equal or greater power for J as compared to P.

The basic assumption, that all permutations of a subject's set of n scores are equally likely, implies severe restrictions on the n-variate distribution of raw scores. These restrictions are almost equivalent to those for analysis of variance. A preliminary test is suggested to indicate when the significance of J may be due to deviations from the "equally likely permutations" assumption, rather than to differences between the n correlated means. A robust (but weak) analog of J is proposed for those cases where this assumption does not hold.


Insert Table 7 here


Table 1

Number of Errors Made in Successive Learning Trials

                    Trial
Subject    1    2    3    4    5    6    7
1         12    3    5    8   10    4    0
2          5    4    8    6    7    0    2
3         10    4    5    8    9    2    1
4          9   11    6   10    5    7    0
5          7    5    9    2    8    3    4
Total     43   27   33   34   39   16    7


Table 2

Jonckheere's Example: Rank-Order of Number of Errors
Made in Each Trial of a Learning Experiment

Trial                      7   6   5   4   3   2   1
Theoretical rank-order     1   2   3   4   5   6   7    S(d²)     P
Subject 1                  1   3   6   5   4   2   7       28    14
Subject 2                  2   1   6   5   7   3   4       34    13
Subject 3                  1   2   6   5   4   3   7       20    15
Subject 4                  1   4   2   6   3   7   5       18    16
Subject 5                  3   2   6   1   7   4   5       34    13
Sum                                                       134    71

Table 3

Seconds of Alpha during Sleep Loss

Table 4

Rank-Order of Seconds of Alpha during Sleep Loss

Table 7

Cumulative Probability Distribution of J

      n = 3                n = 4         n = 5  n = 6
J     m = 2  m = 3  m = 4  m = 2  m = 3  m = 2  m = 2
0     .028   .005   .001   .002   .000   .000   .000
2     .139   .032   .007   .012   .001   .001   .000
4     .250   .088   .025   .031   .003   .002   .000
6     .361   .153   .056   .056   .007   .005   .000
8     .639   .278   .109   .106   .015   .010   .001
10    .750   .444   .201   .148   .028   .017   .001
12    .856   .555   .306   .210   .045   .026   .002
14    .972   .722   .434   .281   .070   .038   .003
16    1.000  .847   .577   .366   .102   .055   .005
18           .912   .701   .443   .143   .073   .007
20           .968   .799   .557   .191   .096   .010
22           .995   .893   .634   .253   .123   .014
24           1.000  .944   .720   .315   .156   .019
26                  .975   .790   .390   .190   .024
28                  .993   .852   .462   .231   .031
30                  .997   .894   .538   .272   .039
32                  1.000  .944   .610   .320   .048
34                         .968   .685   .368   .059
36                         .988   .747   .421   .071
38                         .998   .809   .471   .085
40                         1.000  .857   .529   .101
42                                .898   .579   .118
44                                .930   .632   .137
46                                .955   .680   .158
48                                .972   .728   .181
50                                .985   .769   .205
52                                .993   .810   .231
54                                .997   .844   .257
56                                .999   .877   .287
58                                1.000  .904   .318
60                                1.000  .927   .349
62                                       .946   .381
64                                       .962   .415
66                                       .974   .448
68                                       .983   .482
70*                                      .990   .518

*The remaining probabilities are omitted because of lack of space but can be obtained by symmetry.

